Utilizing Untranscribed User Utterances for Improving Language Models based on Confidence Scoring

Authors

Mikio Nakano

Timothy J. Hazen

Abstract

This paper presents a method for reducing the effort required to transcribe user utterances when developing language models for conversational speech recognition. It targets the case where a small number of transcribed utterances and a large number of untranscribed utterances are available. The recognition hypotheses for the untranscribed utterances are classified according to their confidence scores, and hypotheses with high confidence are used to augment language model training. Utterances that receive low confidence can be scheduled for manual transcription first in order to improve the language model. Experiments in which the untranscribed user utterances were transcribed automatically show that the proposed methods improve recognition accuracy while reducing the required manual transcription effort.
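The confidence-based split described in the abstract can be illustrated with a small sketch. It rests on several assumptions that are not from the paper: the `Hypothesis` structure, the function names, and the 0.8 threshold are all hypothetical. It also assumes that "transcribed first" means the least confident utterances are queued at the front. Plain bigram counting stands in for whatever n-gram training the authors actually used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Recognizer output for one untranscribed utterance (hypothetical structure)."""
    utterance_id: str
    words: list[str]
    confidence: float  # assumed utterance-level confidence score in [0, 1]


def split_by_confidence(hyps, threshold=0.8):
    """Partition hypotheses by confidence (threshold value is an assumption).

    High-confidence hypotheses are returned for language model training;
    low-confidence ones are returned as a manual transcription queue,
    least confident first.
    """
    accepted = [h for h in hyps if h.confidence >= threshold]
    rejected = [h for h in hyps if h.confidence < threshold]
    transcription_queue = sorted(rejected, key=lambda h: h.confidence)
    return accepted, transcription_queue


def bigram_counts(sentences):
    """Count bigrams with sentence-boundary markers as input to an n-gram LM."""
    counts = Counter()
    for words in sentences:
        tokens = ["<s>", *words, "</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Usage with made-up data: one manual transcription plus three hypotheses.
transcribed = [["what", "is", "the", "weather"]]
hyps = [
    Hypothesis("u1", ["weather", "in", "boston"], 0.93),
    Hypothesis("u2", ["whether", "and", "boss"], 0.41),
    Hypothesis("u3", ["will", "it", "rain"], 0.62),
]
accepted, queue = split_by_confidence(hyps, threshold=0.8)
# Training data = manual transcriptions + high-confidence hypotheses.
counts = bigram_counts(transcribed + [h.words for h in accepted])
print([h.utterance_id for h in queue])
```

In this example only `u1` is added to the training counts. `u2` and `u3` would be sent for manual transcription in that order. The threshold trades off the amount of extra training text against the risk of adding misrecognized word sequences.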


Similar resources

Using untranscribed user utterances for improving language models based on confidence scoring


Confidence Scoring for Speech Understanding Systems

This research investigates the use of utterance-level features for confidence scoring. Confidence scores are used to accept or reject user utterances in our conversational weather information system [10]. We have developed an automatic labeling algorithm based on a semantic frame comparison between recognized and transcribed orthographies. We explore recognition-based features along with semant...


Active and unsupervised learning for automatic speech recognition

State-of-the-art speech recognition systems are trained using human transcriptions of speech utterances. In this paper, we describe a method to combine active and unsupervised learning for automatic speech recognition (ASR). The goal is to minimize the human supervision for training acoustic and language models and to maximize the performance given the transcribed and untranscribed data. Active...


Active and Unsupervised Learning for A


A Comparative Study on the Effect of the Formative Use of Confidence-Based Scoring and Conventional Scoring on Iranian EFL Learners’ Grammar Improvement


Publication date: 2003
